Learning binaural spectrogram features for azimuthal speaker localization
Abstract
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker’s position yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
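The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the IC activations have already been extracted, replaces them with hypothetical two-dimensional toy features generated by `make_toy_data`, and assumes a diagonal covariance per class, which the abstract does not specify. All function names and the azimuth grid are illustrative.

```python
import math
import random


def fit_gaussian_classifier(features, labels):
    """Estimate a per-class mean and variance for every feature dimension."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
            for d in range(dims)
        ]
        model[y] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of x under an axis-aligned (diagonal) Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def predict(model, x):
    """Return the class (azimuth) whose Gaussian gives x the highest likelihood."""
    return max(model, key=lambda y: log_likelihood(x, *model[y]))


def make_toy_data(rng, azimuths, n_per_class):
    """Hypothetical stand-in for position-dependent IC activations."""
    features, labels = [], []
    for az in azimuths:
        centre = (math.sin(math.radians(az)), math.cos(math.radians(az)))
        for _ in range(n_per_class):
            features.append([rng.gauss(c, 0.05) for c in centre])
            labels.append(az)
    return features, labels


rng = random.Random(0)
azimuths = [-60, -30, 0, 30, 60]  # assumed azimuth grid, in degrees
train_x, train_y = make_toy_data(rng, azimuths, 200)
test_x, test_y = make_toy_data(rng, azimuths, 50)

model = fit_gaussian_classifier(train_x, train_y)
accuracy = sum(predict(model, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
print(f"toy accuracy: {accuracy:.2f}")
```

In a real pipeline the toy features would be replaced by the activations of the position-dependent ICs, and a full covariance matrix per class may be more appropriate than the diagonal one assumed here.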
Similar resources
Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation
To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial ...
Unsupervised feature learning on monaural DOA estimation using convolutional deep belief networks
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. Additionally, in the field of sound direction-of-arrival (DOA) estimation, the binaural features like interaural time or phase difference and interaural level difference, or monaural cues like spectral peaks and notches are often used to estimate soun...
Localization dominance in the median-sagittal plane: effect of stimulus duration.
Localization dominance is an aspect of the precedence effect (PE) in which the leading source dominates the perceived location of a simulated echo (lagging source). It is known to be robust in the horizontal/azimuthal dimension, where binaural cues dominate localization. However, little is known about localization dominance in conditions that minimize binaural cues, and most models of precedenc...
Testing the Use of the Binaural Cross-Correlation Coefficient in Azimuthal Sound Localization
Azimuthal sound localization studies were performed on 4 listeners for sounds recorded in two different rooms. The sounds had multiple values of binaural coherence ranging from 0.2 to 0.8 in each room. The sounds were presented to the listener, followed by the same sound with a slight delay in either ear to create an interaural time difference. Previous studies performed with synthetically corr...